-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Add url_encode function #133494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Add url_encode function #133494
Conversation
🔍 Preview links for changed docs |
da1095a
to
1c7be9e
Compare
@ConvertEvaluator() | ||
static BytesRef process(final BytesRef val) { | ||
String s = val.utf8ToString(); | ||
String encoded = URLEncoder.encode(s, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two concerns I have about this snippet:
- This actually over-encodes the input. For example
http://elastic.co
becomeshttp%3A%2F%2Felastic.co
. I wonder if we have a better option internally. - Performance when passing strings around, as I see all other evaluators directly manipulating byte arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. I see all other evaluators directly manipulating byte arrays.
Many don't. We'd prefer if they did, but sometimes that's life.
I think the over-encoding is a bigger deal - we should figure out what the folks who asked for this expect from it. If you add this as a SNAPSHOT function then we have a lot of latitude to change it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://elastic.co
becomeshttp%3A%2F%2Felastic.co
From what I read in the RFC, those are reserved characters and "should" be encoded. So I would say that it's fine (?)
@ConvertEvaluator() | ||
static BytesRef process(final BytesRef val) { | ||
String s = val.utf8ToString(); | ||
String encoded = URLEncoder.encode(s, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. I see all other evaluators directly manipulating byte arrays.
Many don't. We'd prefer if they did, but sometimes that's life.
I think the over-encoding is a bigger deal - we should figure out what the folks who asked for this expect from it. If you add this as a SNAPSHOT function then we have a lot of latitude to change it later.
...est/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/UrlEncodeTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Outdated
Show resolved
Hide resolved
...est/java/org/elasticsearch/xpack/esql/expression/function/scalar/convert/UrlEncodeTests.java
Outdated
Show resolved
Hide resolved
@ConvertEvaluator() | ||
static BytesRef process(final BytesRef val) { | ||
String s = val.utf8ToString(); | ||
String encoded = URLEncoder.encode(s, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://elastic.co
becomeshttp%3A%2F%2Felastic.co
From what I read in the RFC, those are reserved characters and "should" be encoded. So I would say that it's fine (?)
x-pack/plugin/esql/qa/testFixtures/src/main/resources/string.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/string.csv-spec
Outdated
Show resolved
Hide resolved
4282895
to
a3f01fb
Compare
docs/reference/query-languages/esql/_snippets/lists/string-functions.md
Outdated
Show resolved
Hide resolved
...sql/src/main/java/org/elasticsearch/xpack/esql/expression/function/EsqlFunctionRegistry.java
Outdated
Show resolved
Hide resolved
a3f01fb
to
b24117f
Compare
c6fb600
to
25545d4
Compare
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/kibana-esql (ES|QL-ui) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too.
25545d4
to
2d3cdcc
Compare
The failing release-test (https://buildkite.com/elastic/elasticsearch-pull-request/builds/88528/steps/canvas?jid=0198f0c9-4928-4cf3-9939-0a867d857f5f) is unrelated to these changes. Removing |
- Rename the capability to exclude the decoding portion, since this PR is only concerned with encoding. - Mark url_encode as SNAPSHOT to allow changes in encoding logic. - Add tests that use random strings to test a wide range of characters, that run alongside the ones that use random URLs. - Minor change in the sample csv test string to showcase a fictional URL with path and few query params.
- Remove url_encode from docs since it's a snapshot function - Declare it along other snapshot functions in EsqlFunctionRegistry - Annotate UrlEncode similarly to other snapshot functions - Add url_encode capability to csv tests
2d3cdcc
to
7572fc5
Compare
- Add url_decode as a snapshot function. - Move common logic to both encode/decode to a parent class
- Add url_decode as a snapshot function. - Move common logic to both encode/decode to a parent class
- Add url_decode as a snapshot function. - Move common logic to both encode/decode to a parent class
* ESQL: Add url_encode function
- Add url_decode as a snapshot function. - Move common logic to both encode/decode to a parent class
BASE=37f65b0bf8eb0f75ea696ad136eac0bd50005330 HEAD=7572fc5993ae95ef569fd2283666951cb387ff78 Branch=main
BASE=37f65b0bf8eb0f75ea696ad136eac0bd50005330 HEAD=7572fc5993ae95ef569fd2283666951cb387ff78 Branch=main
BASE=37f65b0bf8eb0f75ea696ad136eac0bd50005330 HEAD=7572fc5993ae95ef569fd2283666951cb387ff78 Branch=main
BASE=37f65b0bf8eb0f75ea696ad136eac0bd50005330 HEAD=7572fc5993ae95ef569fd2283666951cb387ff78 Branch=main
url_encode
function